MCP Server
This document explains the Model Context Protocol (MCP) Server implementation in the Agentic Browser project. It covers how the MCP server exposes tools to AI models via stdio, how tools are defined and registered, and how requests and responses are handled. It also documents the communication patterns between the MCP server, the browser extension, and AI agents, along with prompt engineering aspects, configuration management, tool lifecycle, error handling, and security/performance considerations.
The MCP server lives under a dedicated module and integrates with core LLM capabilities, prompt chains, and website context extraction tools. The broader system includes an agent framework and a browser extension that communicates with a backend via WebSocket.
MCP Server: Defines and registers tools, handles tool invocations, and streams responses back to the AI model over stdio.
LLM Provider Abstraction: Centralizes provider selection, model instantiation, and text generation.
Prompt Chains: Provide domain-specific prompting for GitHub repositories and other domains.
Website Context Tools: Fetch and convert website content to markdown for downstream consumption.
Agent Framework: Provides a structured agent with tool execution and state management.
Extension: Implements background and content scripts and a WebSocket client for agent orchestration.
The MCP server runs as a standalone process communicating over stdio. AI models request tools via MCP, and the server executes them, returning textual content. The browser extension coordinates agent actions and can communicate with a backend via WebSocket. The agent framework orchestrates tool use and manages conversational state.
MCP Server: Tool Definition and Invocation
Tool Registration: The server registers four tools via a decorator that returns a list of tool descriptors with names, descriptions, and JSON Schemas for inputs.
Tool Execution: The server routes tool calls to specific handlers, instantiating providers or invoking utility functions, and returns TextContent responses.
Tool Call Routing: the call handler dispatches on tool name — the text-generation tool invokes the LLM provider to generate text; github.answer invokes the GitHub processor; website.fetch_markdown fetches markdown via Jina; website.html_to_md converts HTML to markdown; any other name returns an unknown-tool error. Every branch returns a TextContent response.
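The routing pattern described above can be sketched as a name-to-handler dispatch table. The handler bodies and the tool name llm.generate below are placeholders for illustration, not the project's actual implementations:

```python
import asyncio

# Hypothetical handlers mirroring the four registered tools; their logic here
# is a stand-in so the routing pattern itself is runnable.
async def generate_text(args: dict) -> str:
    return f"generated: {args.get('prompt', '')}"

async def github_answer(args: dict) -> str:
    return f"answer about {args.get('repo', '')}"

async def fetch_markdown(args: dict) -> str:
    return f"markdown for {args.get('url', '')}"

async def html_to_md(args: dict) -> str:
    return args.get("html", "")

TOOL_HANDLERS = {
    "llm.generate": generate_text,
    "github.answer": github_answer,
    "website.fetch_markdown": fetch_markdown,
    "website.html_to_md": html_to_md,
}

async def call_tool(name: str, arguments: dict) -> dict:
    """Route a tool call to its handler; unknown names yield an error payload."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"type": "text", "text": f"Unknown tool: {name}"}
    return {"type": "text", "text": await handler(arguments)}

result = asyncio.run(call_tool("llm.generate", {"prompt": "hi"}))
```

Every branch, including the error branch, returns the same text-content shape, which keeps the response contract uniform for the AI model.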
LLM Provider Abstraction
Provider Configurations: Centralized mapping of providers to underlying SDK clients, default models, and parameter mappings.
Initialization: Validates API keys and base URLs, constructs the appropriate client, and raises descriptive errors on misconfiguration.
Generation: Accepts optional system messages and returns generated text.
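A minimal sketch of the fail-fast initialization pattern follows; the provider names, environment-variable keys, and default models are assumptions for illustration, not the project's actual configuration table:

```python
import os

# Illustrative provider table: each entry names the secret it needs (if any)
# and a default model. Real entries would also carry SDK client classes and
# parameter mappings.
PROVIDER_CONFIGS = {
    "openai": {"env_key": "OPENAI_API_KEY", "default_model": "gpt-4o-mini"},
    "ollama": {"env_key": None, "default_model": "llama3"},
}

def init_provider(name: str) -> dict:
    """Validate configuration up front and raise a descriptive error early."""
    config = PROVIDER_CONFIGS.get(name)
    if config is None:
        raise ValueError(f"Unknown provider: {name!r}")
    if config["env_key"] and not os.environ.get(config["env_key"]):
        raise ValueError(
            f"Missing environment variable {config['env_key']} for provider {name!r}"
        )
    return {"provider": name, "model": config["default_model"]}
```

Validating at construction time rather than at first generation call means a misconfigured deployment fails immediately with an actionable message.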
Prompt Engineering for GitHub Tools
Prompt Template: System and user prompt template designed to constrain responses to repository context.
Runnable Chain: Composes inputs (tree, summary, content, question, chat history) with a formatter and an LLM client to produce a final answer.
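The composition step can be sketched as a plain formatter over the five inputs named above; the template wording here is an assumption, only the input fields (tree, summary, content, question, chat history) come from the description:

```python
# System prompt constrains answers to the repository context, as described.
SYSTEM_TEMPLATE = (
    "You answer questions strictly from the given GitHub repository context. "
    "If the answer is not in the context, say so."
)

USER_TEMPLATE = (
    "Repository tree:\n{tree}\n\n"
    "Summary:\n{summary}\n\n"
    "Relevant content:\n{content}\n\n"
    "Chat history:\n{chat_history}\n\n"
    "Question: {question}"
)

def build_prompt(tree: str, summary: str, content: str,
                 question: str, chat_history: str = "") -> tuple[str, str]:
    """Format the chain inputs into (system, user) messages for the LLM client."""
    user = USER_TEMPLATE.format(
        tree=tree, summary=summary, content=content,
        question=question, chat_history=chat_history,
    )
    return SYSTEM_TEMPLATE, user
```

In the actual chain this formatter would be composed with the LLM client so the formatted messages flow directly into generation.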
Website Context Tools
Fetch Markdown: Uses an external service to retrieve markdown content for a given URL.
HTML to Markdown: Converts raw HTML to markdown using parsing utilities.
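Both tools can be sketched briefly. The Jina reader URL prefix below reflects that service's public convention, and the converter is a deliberately minimal stand-in: the project's actual implementation presumably uses full parsing utilities rather than this stdlib-only sketch:

```python
from html.parser import HTMLParser

JINA_READER = "https://r.jina.ai/"  # external markdown-extraction service

def markdown_fetch_url(url: str) -> str:
    """Build the reader URL; an HTTP GET on it returns markdown for the page."""
    return JINA_READER + url

class _TextExtractor(HTMLParser):
    """Minimal HTML-to-markdown conversion: headings and paragraph breaks only."""
    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in ("h1", "h2", "h3"):
            self.parts.append("#" * int(tag[1]) + " ")

    def handle_data(self, data):
        self.parts.append(data)

    def handle_endtag(self, tag):
        if tag in ("h1", "h2", "h3", "p"):
            self.parts.append("\n")

def html_to_markdown(html: str) -> str:
    extractor = _TextExtractor()
    extractor.feed(html)
    return "".join(extractor.parts).strip()
```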
Agent Framework and Tool Lifecycle
Agent State: Maintains conversation context and supports tool calls with structured messages.
ToolNode Integration: Tools are structured and invoked by the agent’s tool node, enabling conditional routing between agent reasoning and tool execution.
Tool Definitions: Rich tool schemas and coroutines encapsulate domain-specific capabilities.
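The conditional routing between reasoning and tool execution can be sketched as follows, in the spirit of a LangGraph-style ToolNode; the state shape and function names are illustrative assumptions:

```python
def route_after_agent(state: dict) -> str:
    """If the last agent message requests tool calls, route to tools; else end."""
    last = state["messages"][-1]
    return "tools" if last.get("tool_calls") else "end"

def run_tool_node(state: dict, tools: dict) -> dict:
    """Execute pending tool calls and append their results to the state."""
    last = state["messages"][-1]
    for call in last.get("tool_calls", []):
        result = tools[call["name"]](**call["args"])
        state["messages"].append(
            {"role": "tool", "name": call["name"], "content": result}
        )
    return state
```

After the tool node runs, control returns to the agent's reasoning step, which sees the tool results in the conversation state and either issues more tool calls or produces a final answer.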
Browser Extension Communication Patterns
Background Script: Handles messaging for agent tool execution, tab management, and action dispatch.
Content Script: Provides lightweight DOM-aware actions and can be extended for richer interactions.
WebSocket Client: Manages connection to a backend, emits progress events, and supports agent execution commands.
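A sketch of the kinds of messages such a client exchanges is shown below. The message types and field names are hypothetical, inferred from the behaviors described (execution commands and progress events), not taken from the project's wire protocol:

```python
import json

def execute_command(task: str, task_id: str) -> str:
    """Serialize an agent-execution command for the backend."""
    return json.dumps({"type": "agent.execute", "id": task_id, "task": task})

def progress_event(task_id: str, step: str, done: bool = False) -> str:
    """Serialize a progress event emitted while an agent task runs."""
    return json.dumps({"type": "agent.progress", "id": task_id, "step": step, "done": done})

def parse_message(raw: str) -> dict:
    """Decode an incoming message, rejecting payloads without a type tag."""
    msg = json.loads(raw)
    if "type" not in msg:
        raise ValueError("Malformed message: missing 'type'")
    return msg
```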
MCP Server depends on:
LLM provider abstraction for text generation
Prompt chains for contextual QA
Website context tools for content retrieval/conversion
Agent framework depends on:
Structured tools and prompt chains
LLM provider for reasoning
Extension depends on:
Background and content scripts for browser automation
WebSocket client for backend coordination
Asynchronous Execution: MCP tool handlers and agent workflows leverage async patterns to avoid blocking.
Threading for Blocking IO: Agent tools use threads for blocking operations (e.g., HTTP requests, file reads) to keep the event loop responsive.
Caching: Agent graph compilation is cached to reduce startup overhead.
Provider Selection: LLM initialization validates environment variables early to fail fast and avoid runtime retries.
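The threading and caching patterns above can be sketched with standard-library primitives; the function names and the simulated blocking call are illustrative:

```python
import asyncio
import functools
import time

def blocking_fetch(url: str) -> str:
    """Stand-in for a blocking operation such as an HTTP request or file read."""
    time.sleep(0.01)
    return f"body of {url}"

async def fetch(url: str) -> str:
    # asyncio.to_thread runs the blocking call in a worker thread,
    # keeping the event loop free to service other tool invocations.
    return await asyncio.to_thread(blocking_fetch, url)

@functools.lru_cache(maxsize=1)
def compile_graph() -> str:
    """Expensive one-time setup (e.g. agent graph compilation), cached on first use."""
    return "compiled-graph"
```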
MCP Tool Not Found: Ensure the requested tool name matches the registered tool names and schemas.
LLM Initialization Failures: Verify provider configuration, API keys, and base URLs. The LLM provider raises explicit errors when required environment variables are missing.
GitHub Processor Errors: Confirm that the prompt chain receives all required inputs and that the LLM client is reachable.
Website Tools Failures: Check network connectivity to the external markdown service and input URL validity.
Extension WebSocket Issues: Validate backend URL and network connectivity; the WebSocket client logs connection events and errors.
The MCP Server provides a focused, extensible interface for exposing tools to AI models. By centralizing LLM providers, prompt engineering, and content extraction utilities, it enables secure, structured interactions between AI agents and browser automation. The agent framework and extension components complement the MCP server to deliver a cohesive agentic browser experience.
Configuration Management
Environment Variables: API keys and base URLs are resolved from environment variables per provider configuration.
Logging: Centralized logging configuration supports development and production environments.
Security Considerations
API Keys and Base URLs: Providers requiring secrets rely on environment variables; avoid embedding credentials in code.
Tool Inputs: MCP tool schemas define required fields and types to reduce injection risks.
Extension Permissions: Background and content scripts should limit permissions to those required for automation.
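The schema-based input check can be sketched as a minimal required-fields and type validator; a production server would use a full JSON Schema validator (e.g. the jsonschema package) rather than this hand-rolled subset:

```python
# Map JSON Schema type names to Python types for the subset we check.
TYPES = {"string": str, "number": (int, float), "boolean": bool, "object": dict}

def validate_input(schema: dict, arguments: dict) -> list[str]:
    """Return a list of validation errors; an empty list means the input passes."""
    errors = []
    for field in schema.get("required", []):
        if field not in arguments:
            errors.append(f"missing required field: {field}")
    for field, spec in schema.get("properties", {}).items():
        if field in arguments and not isinstance(arguments[field], TYPES[spec["type"]]):
            errors.append(f"field {field} must be of type {spec['type']}")
    return errors
```

Rejecting malformed arguments before they reach a tool handler narrows the surface for injection via crafted tool calls.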
Example Tool Implementations and Client Integration
MCP Tool Implementations: See the tool registration and call handlers for patterns to add new tools.
Client Integration: Use the WebSocket client to integrate agent execution and progress reporting with the extension.